Abstract

An RDF data shape is a description of the expected contents of an 
RDF document (aka graph) or dataset. A major part of this description 
is the set of constraints that the document or dataset is required to satisfy. 
W3C recently (2014) chartered the RDF Data Shapes Working Group to 
define SHACL, a standard RDF data shape language. We refer to the 
ability to name and reference shape language elements as recursion. This 
article provides a precise definition of the meaning of recursion as used in 
Resource Shape 2.0. The definition of recursion presented in this article is 
largely independent of language-specific details. We speculate that it also 
applies to ShEx and to all three of the current proposals for SHACL. In 
particular, recursion is not permitted in the SHACL-SPARQL proposal, 
but we conjecture that recursion could be added by using the definition 
proposed here as a top-level control structure. 


1 Introduction 

An RDF data shape is a description of the expected contents of an RDF
document (aka graph) or dataset. A major part of this description is the set of
constraints that the document or dataset is required to satisfy. In this respect,
data shapes do for RDF what XML Schema [6] does for XML. The term shape
is used instead of schema to avoid confusion with RDF Schema [3] which, like
OWL [9], describes inference rules, not constraints.

W3C recently (2014) chartered the RDF Data Shapes Working Group to 
define SHACL, a standard RDF data shape language [8]. Both of the member
submissions to this working group, Resource Shape 2.0 [13] and Shape
Expressions (ShEx) [15], allow shapes to refer to each other. For example, in Resource
Shape 2.0 the property oslc:valueShape lets one resource shape refer to
another. ShEx has a similar feature. In these languages, a shape may refer directly
or indirectly to itself.

We refer to the ability to name and reference shape language elements as 
recursion in analogy with that ubiquitous feature of programming languages 
which allows a function to call other functions, including itself. Of course, 
when writing a recursive function care must be taken to ensure that recursion 
terminates. Similarly, when defining a shape language care must be taken to 
spell out the precise meaning of recursion. Neither of the member submissions
included a precise definition of recursion. 

This article provides a precise definition of the meaning of recursion as used
in Resource Shape 2.0. Precision is achieved through the use of Z Notation [16],
a formal specification language based on typed set theory. The source
for this article has been type-checked using the fuzz type-checker [17] and is
available in the GitHub repository agryman/shape-recursion [14].

The definition of recursion presented in this article is largely independent 
of language-specific details. We speculate that it also applies to ShEx and 
to all three of the current proposals for SHACL. In particular, recursion is 
not permitted in the SHACL-SPARQL proposal [10], but we conjecture that 
recursion could be added by using the definition proposed here as a top-level 
control structure. 

1.1 Organization of this Article 

The remainder of this article is organized as follows. 

• Section 2 introduces examples in order to ground the following definitions. 

• Section 3 defines a few basic RDF concepts.

• Section 4 defines neighbour functions and graphs which form the basis for
the following definition of recursion.

• Section 5 defines constraints.

• Section 6 defines recursive shapes.

• Section 7 discusses how the proposed definition of recursion relates to the
existing and proposed shape languages.

• Section 8 concludes the article. 

2 Examples 

This section introduces two examples of recursive shapes. 

The first recursive shape describes the data in a Personal Information
Management application. This application is highly simplified and easy to
understand. It is used as a running example to illustrate the formal definitions.
Although this shape is written using recursion, it can be re-written as an
equivalent, non-recursive shape.

The second recursive shape describes what it means to be a Polentoni [11]. 
This shape is also highly simplified but cannot be re-written as non-recursive 
using the Resource Shape 2.0 specification. 





2.1 Example: Personal Information Management 

We use a highly simplified running example to illustrate the concepts defined in 
the following sections. Each formal definition is instantiated with data drawn 
from the running example in order to help the reader understand the formalism 
and relate it to RDF. Although the inclusion of examples lengthens the presen¬ 
tation, we hope that it will make the formalism more tangible and accessible to 
readers who are unfamiliar with Z Notation. 

Consider a Linked Data [2] application for Personal Information
Management (PIM). The application manages documents that contain information
about a contact person and their associates. As a Linked Data application, 
the PIM application provides a REST API for creating, retrieving, updating, 
and deleting contact information over HTTP using RDF representations of the 
data. Shapes are useful in this context for two main reasons. First, the PIM 
application may publish shapes that describe the contact information so that 
application developers who want to use the REST API understand the API 
contract. Second, the PIM application may internally use a shape engine that 
automatically validates the data, especially incoming creation and update
requests.

The prefixes rdf: and foaf: are used for terms in the RDF [5] and FOAF [4]
vocabularies. The application maintains the following integrity constraints.

• Each document contains information about exactly one contact person and 
zero or more of their associates. A contact person is never an associate of 
themself. 

• Each contact has type foaf:Person and has exactly one name given by
the property foaf:name.

• The contact's associates are given by the property foaf:knows which may
have zero or more values.

• Each associate has type foaf:Person and has exactly one name given by
foaf:name.

• Each associate is known by exactly one contact given by following the 
property foaf:knows in the backward direction, i.e. the associate is the 
object of the property and the contact is the subject. 

Note that these constraints are circular since the definition of contact refers 
to the definition of associate, and conversely. We have an obligation to give this
circularity a precise meaning. 

These constraints are illustrated by a valid document for Alice (Listing 1) 
and an invalid document for Bob (Listing 2). All RDF source code examples 
are written in Turtle format [1]. 

The following document about Alice satisfies all the constraints of the
application.

    # http://example.org/contacts/alice
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @base <http://example.org/contacts/> .

    <alice#me> a foaf:Person ;
        foaf:name "Alice" ;
        foaf:knows
            <bob#me> ,
            <charlie#me> .

    <bob#me> a foaf:Person ;
        foaf:name "Bob" .

    <charlie#me> a foaf:Person ;
        foaf:name "Charlie" .

Listing 1: Contact document for Alice 

The following document about Bob violates some of the constraints of the 
application. 

    # http://example.org/contacts/bob
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @base <http://example.org/contacts/> .

    <bob#me> a foaf:Person ;
        foaf:name "Bob" ;
        foaf:knows
            <alice#me> ,
            <charlie#me> .

    <alice#me> foaf:name "Alice" .

    <charlie#me> a foaf:Person .

Listing 2: Contact document for Bob 

It is clear that the document about Bob is invalid, since Alice has no type 
and Charlie has no name. 

It is also intuitively clear that the document about Alice is valid. However, if 
we naively translate the PIM constraints into logical conditions on the document 
about Alice, then we run into a problem. All the constraints about types, names, 
and who knows who are satisfied and unproblematic, but the constraints about 
what it means to be a contact or an associate are circular. A naive translation 
of these constraints on the Alice document is as follows. 

• If Bob is an associate and Charlie is an associate then Alice is a contact. 

• If Alice is a contact then Bob is an associate. 

• If Alice is a contact then Charlie is an associate. 

Table 1 introduces propositional variables to stand for statements about
being a contact or associate in the Alice document. 


Variable    Meaning
--------    ------------------------
A           Alice is a contact.
B           Bob is an associate.
C           Charlie is an associate.


Table 1: Meaning of propositional variables in the Alice document 

The PIM constraints on the Alice document translate to the following
consistency condition.

\[ (B \land C \Rightarrow A) \land (A \Rightarrow B) \land (A \Rightarrow C) \]

Unfortunately, this consistency condition does not uniquely determine the 
values of the propositional variables. In fact, this consistency condition has 
several solutions as shown in Table 2. Only the solution in which all the
propositional variables are true agrees with our intuition.


A        B        C
-----    -----    -----
true     true     true
false    true     false
false    false    true
false    false    false


Table 2: Solutions to PIM constraints in the Alice document 
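
The solutions in Table 2 can be checked by brute force. The following Python sketch enumerates all eight truth assignments and keeps those that satisfy the consistency condition. The helper names `implies` and `consistent` are illustrative assumptions and are not part of the formal development.

```python
from itertools import product


def implies(p, q):
    """Material implication p => q."""
    return (not p) or q


def consistent(a, b, c):
    """(B and C => A) and (A => B) and (A => C)."""
    return implies(b and c, a) and implies(a, b) and implies(a, c)


# Enumerate every assignment of (A, B, C) and keep the consistent ones.
solutions = [
    (a, b, c)
    for a, b, c in product([True, False], repeat=3)
    if consistent(a, b, c)
]

for row in solutions:
    print(row)
```

Four assignments survive, so the consistency condition alone does not single out the intended reading in which A, B, and C are all true.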

This analysis shows that the naive translation of the constraints about
contacts and associates produces a necessary, but not sufficient, consistency
condition on the meaning of these constraints. A precise definition for this type of
constraint is given in Section 6. A brief overview of this definition follows. 

The correct interpretation of the constraints is based on the observation that 
they specify two essentially different kinds of information. One kind defines 
rules for labelling nodes with names. The other kind defines a set of conditions 
associated with each name and asserts that these conditions must hold at each 
node labelled with that name. 

In the PIM application, the names are contact and associate. The rules for 
labelling the nodes in a document are as follows. 

1. Initially, no node has any labels. 

2. Start with the node that corresponds to the person that the document is 
about, and label it as a contact. 

3. For each node labelled as a contact, find all the nodes they know, and add 
an associate label to each of them. 

4. For each node labelled as an associate, find all the nodes that they are
known by and add a contact label to each of them.

5. Repeat the previous two steps until no new labels are added. 

6. Note that this procedure always terminates because the number of nodes 
is finite and the number of names is finite (2 in this case). 
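
The labelling rules above can be sketched as a small worklist algorithm. This is a minimal Python illustration, assuming that a document is represented as a set of (subject, predicate, object) tuples and that nodes are plain strings. The names `label_pim`, `knows`, and `known_by` are hypothetical and are not taken from Resource Shape 2.0.

```python
FOAF_KNOWS = "foaf:knows"

# The foaf:knows triples of the Alice document (Listing 1).
alice_triples = {
    ("alice", FOAF_KNOWS, "bob"),
    ("alice", FOAF_KNOWS, "charlie"),
}


def knows(triples, x):
    """Nodes that x knows (forward direction)."""
    return {o for (s, p, o) in triples if s == x and p == FOAF_KNOWS}


def known_by(triples, x):
    """Nodes that x is known by (backward direction)."""
    return {s for (s, p, o) in triples if o == x and p == FOAF_KNOWS}


def label_pim(triples, start):
    """Return the set of (node, name) labels produced by rules 1 to 5."""
    labels = {(start, "contact")}      # rules 1 and 2
    worklist = [(start, "contact")]
    while worklist:                    # rule 5: repeat until nothing is new
        node, name = worklist.pop()
        if name == "contact":          # rule 3
            new = {(y, "associate") for y in knows(triples, node)}
        else:                          # rule 4
            new = {(y, "contact") for y in known_by(triples, node)}
        for item in new - labels:
            labels.add(item)
            worklist.append(item)
    return labels


print(sorted(label_pim(alice_triples, "alice")))
```

Each (node, name) pair enters the worklist at most once, which mirrors the termination argument in rule 6.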

In the Alice document, the labelling procedure results in the nodes being 
labelled as follows. 

• Alice is labelled as a contact because the document is about Alice and
because the associates Bob and Charlie are known by Alice.

• Bob is labelled as an associate because Alice knows Bob. 

• Charlie is labelled as an associate because Alice knows Charlie. 

Whenever a node gets labelled with a name, the conditions associated with 
the name must hold. No recursion is involved in this step. 

The conditions that must hold for nodes labelled with contact are as follows. 

• A contact must be a person. 

• A contact must have exactly one name. 

• A contact must not know itself. 

The conditions that must hold for nodes labelled with associate are as
follows.

• An associate must be a person. 

• An associate must have exactly one name. 

• An associate must be known by exactly one node. 

Although the statement of the PIM constraints uses recursion, the properties 
of the data in this case allow us to write an equivalent non-recursive statement 
[12]. Specifically, since a node is an associate only if it is known by a contact, 
and an associate must be known by exactly one contact, nothing more is gained 
by requiring that all nodes that know an associate must be contacts. Dropping 
this condition removes the recursion. However, in general we cannot convert 
a recursive constraint into an equivalent non-recursive constraint. The next 
example illustrates an essentially recursive constraint. 

2.2 Example: Polentoni

Consider the following definition of what it means to be a Polentoni [11]. 

• A Polentoni lives in exactly one place and that place is Northern Italy. 

• A Polentoni only knows other Polentoni. 

The definition of Polentoni refers to itself and is therefore recursive. However, 
we can give it a precise meaning using the labelling procedure described above. 

In this example, the only label name is Polentoni. The labelling procedure 
is as follows. 

1. Initially, no node has any labels. 

2. Start with the node to be checked for being a Polentoni, and label it as a 
Polentoni. 

3. For each node labelled as a Polentoni, find all the nodes they know, and 
add a Polentoni label to each of them. 

4. Repeat the previous step until no new labels are added. 

5. Note that this procedure always terminates because the number of nodes 
is finite and the number of names is finite (1 in this case). 

The condition that must hold for nodes labelled with Polentoni is as follows. 

• A Polentoni must live in Northern Italy. 

Listing 3 contains some sample data. 

    @prefix ex: <http://example.org/polentoni#> .

    ex:Enrico ex:livesIn ex:NorthernItaly .
    ex:Diego ex:livesIn ex:NorthernItaly .
    ex:Alessandro ex:livesIn ex:NorthernItaly .
    ex:Sergio ex:livesIn ex:NorthernItaly .
    ex:John ex:livesIn ex:NorthernItaly .
    ex:Maurizio ex:livesIn ex:SouthernItaly .

    ex:Enrico ex:knows ex:John .
    ex:John ex:knows ex:Maurizio .
    ex:Diego ex:knows ex:Alessandro .
    ex:Alessandro ex:knows ex:Diego .
    ex:Alessandro ex:knows ex:Sergio .

Listing 3: Polentoni sample data 

Figure 1 depicts the Polentoni sample data where, for example, the arrow 
from Enrico to John indicates that Enrico knows John. 

Checking Enrico results in Enrico, John, and Maurizio being labelled as
Polentoni. However, Maurizio lives in Southern Italy so Enrico is not a Polentoni.

Figure 1: Polentoni sample data


Checking Diego results in Diego, Alessandro, and Sergio being labelled as
Polentoni. They all live in Northern Italy so Diego is a Polentoni. 
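
The two checks above can be reproduced with a short Python sketch of the labelling procedure applied to the data of Listing 3. It assumes the graph is a set of string triples. The function names `label_polentoni`, `lives_only_in_north`, and `is_polentoni` are illustrative and do not come from any shape language.

```python
LIVES_IN, KNOWS = "ex:livesIn", "ex:knows"
NORTH, SOUTH = "ex:NorthernItaly", "ex:SouthernItaly"

# The triples of Listing 3.
data = {
    ("Enrico", LIVES_IN, NORTH), ("Diego", LIVES_IN, NORTH),
    ("Alessandro", LIVES_IN, NORTH), ("Sergio", LIVES_IN, NORTH),
    ("John", LIVES_IN, NORTH), ("Maurizio", LIVES_IN, SOUTH),
    ("Enrico", KNOWS, "John"), ("John", KNOWS, "Maurizio"),
    ("Diego", KNOWS, "Alessandro"), ("Alessandro", KNOWS, "Diego"),
    ("Alessandro", KNOWS, "Sergio"),
}


def label_polentoni(triples, start):
    """Label the start node, then everyone a labelled node knows."""
    labelled, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == KNOWS and o not in labelled:
                labelled.add(o)
                frontier.append(o)
    return labelled


def lives_only_in_north(triples, node):
    """The node lives in exactly one place, and it is Northern Italy."""
    places = {o for s, p, o in triples if s == node and p == LIVES_IN}
    return places == {NORTH}


def is_polentoni(triples, start):
    """Every labelled node must satisfy the Polentoni condition."""
    return all(lives_only_in_north(triples, n)
               for n in label_polentoni(triples, start))


print(is_polentoni(data, "Enrico"), is_polentoni(data, "Diego"))
```

The cycle between Diego and Alessandro is harmless because a node is added to the frontier only the first time it is labelled.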

Note that if Resource Shape 2.0 were more expressive then we could rewrite 
the definition of Polentoni to avoid recursion as follows. 

• A Polentoni lives in Northern Italy (and nowhere else). 

• Everyone that a Polentoni knows, directly or indirectly, lives in Northern 
Italy (and nowhere else). 

The price paid for eliminating recursion is that now we have introduced the 
transitive closure of the knows relation, which is beyond the expressive power 
of the Resource Shape 2.0 specification. 

Transitive closure is, however, expressible using SPARQL property paths. In 
fact, all the Polentoni constraints can be expressed by a single SPARQL query. 
Listing 4 contains a SPARQL query that finds all non-Polentoni people in a 
graph, where we assume that a person is any resource that lives somewhere, or 
knows someone, or is known by someone. Note the use of the property path 
ex:knows* which is referred to as a ZeroOrMorePath expression. 

    # polentoni.rq
    prefix ex: <http://example.org/polentoni#>

    # finds all non-Polentoni person nodes ?this in the graph
    select distinct ?this
    where {
      # binds each person node to ?this
      {
        select distinct ?this
        where {
          {?this ex:livesIn ?region}
          union
          {?this ex:knows ?person}
          union
          {?person ex:knows ?this}
        }
      }

      # binds each person that ?this knows,
      # directly or indirectly, to ?person
      ?this ex:knows* ?person .

      # A non-Polentoni ?person must not live
      # in and only in Northern Italy
      {
        # ?person lives nowhere
        filter not exists {?person ex:livesIn ?region}
      }
      union
      {
        # ?person lives somewhere not Northern Italy
        ?person ex:livesIn ?region .
        filter (?region != ex:NorthernItaly)
      }
    }

Listing 4: SPARQL query for non-Polentoni people 


Table 3 gives the results of running the non-Polentoni query on the data
contained in Listing 3. 


this
-------------------------------------
http://example.org/polentoni#Maurizio
http://example.org/polentoni#John
http://example.org/polentoni#Enrico


Table 3: SPARQL query results for non-Polentoni people 


• Maurizio is non-Polentoni because he lives in Southern Italy. 

• John is non-Polentoni because he knows Maurizio. 

• Enrico is non-Polentoni because he knows John. 

One might therefore contemplate avoiding the issue of recursion by adding 
powerful path expressions to the shape language. However, it is unclear that 
path expressions alone are sufficiently powerful to cover all the cases currently 
expressible in Resource Shape 2.0. Furthermore, even if that were true,
translating recursive references into property path expressions would impose a severe
burden on the shape author. The use of recursion allows concise and intuitively
clear descriptions so, as long as recursion can be given a precise definition, there
is good reason to include it in future shape languages.

3 Basic RDF Concepts 

This section formalizes some basic RDF concepts. For full definitions consult 
the RDF specification[5]. 

3.1 Terms 

Let TERM be the set of all RDF terms. 

[TERM] 

The set of all RDF terms is partitioned into IRIs, blank nodes, and literals. 

\[
\begin{array}{l}
IRI, BNode, Literal : \mathbb{P}\, TERM \\
\hline
\langle IRI, BNode, Literal \rangle \ \mathsf{partition}\ TERM
\end{array}
\]

For example, the documents for Alice and Bob contain the following distinct 
literals where Alice denotes "Alice", etc. 

\[
\begin{array}{l}
Alice, Bob, Charlie : Literal \\
\hline
\mathsf{disjoint}\ \langle \{Alice\}, \{Bob\}, \{Charlie\} \rangle
\end{array}
\]

and the following distinct IRIs where alice denotes
http://example.org/contacts/alice#me, etc., rdf_type denotes rdf:type, and
foaf_Person denotes foaf:Person, etc.

\[
\begin{array}{l}
alice, bob, charlie : IRI \\
rdf\_type : IRI \\
foaf\_Person, foaf\_name, foaf\_knows : IRI \\
\hline
\mathsf{disjoint}\ \langle \{alice\}, \{bob\}, \{charlie\}, \{rdf\_type\}, \\
\quad \{foaf\_Person\}, \{foaf\_name\}, \{foaf\_knows\} \rangle
\end{array}
\]

3.2 Triples 

An RDF triple is a statement that consists of three terms referred to as subject, 
predicate, and object. 

\[ Triple == \{\, s, p, o : TERM \mid s \notin Literal \land p \in IRI \,\} \]

• The subject must not be a literal.

• The predicate must be an IRI.

For example, the statement that Alice is a person is represented by the
following triple.

\[ \vdash (alice, rdf\_type, foaf\_Person) \in Triple \]

3.3 Graphs 

It is common to visualize a triple as a directed arc from the subject to the 
object, labelled by the predicate. A set of triples may therefore be visualized
as a directed graph (technically, a directed, labelled multigraph). We are only
concerned with finite graphs here. 

An RDF graph is a finite set of triples. 

\[ Graph == \mathbb{F}\, Triple \]

For example, the following graph contains the triples in the document about 
Alice. 


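\[
\begin{array}{l}
alice\_graph : Graph \\
\hline
alice\_graph = \\
\quad \{ (alice, rdf\_type, foaf\_Person), \\
\quad\ \ (alice, foaf\_name, Alice), \\
\quad\ \ (alice, foaf\_knows, bob), \\
\quad\ \ (alice, foaf\_knows, charlie), \\
\quad\ \ (bob, rdf\_type, foaf\_Person), \\
\quad\ \ (bob, foaf\_name, Bob), \\
\quad\ \ (charlie, rdf\_type, foaf\_Person), \\
\quad\ \ (charlie, foaf\_name, Charlie) \}
\end{array}
\]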

Figure 2 depicts the document about Alice as a directed, labelled graph. 

It is convenient to define functions that map graphs to the sets of subjects, 
predicates, and objects that appear in the graph. 

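\[
\begin{array}{l}
subjects == (\lambda g : Graph \bullet \{\, s, p, o : TERM \mid (s, p, o) \in g \bullet s \,\}) \\
predicates == (\lambda g : Graph \bullet \{\, s, p, o : TERM \mid (s, p, o) \in g \bullet p \,\}) \\
objects == (\lambda g : Graph \bullet \{\, s, p, o : TERM \mid (s, p, o) \in g \bullet o \,\})
\end{array}
\]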

For example, the graph for Alice contains the following predicates. 

\[ \vdash predicates(alice\_graph) = \{ rdf\_type, foaf\_name, foaf\_knows \} \]

The nodes of a graph are its subjects and objects.

\[ nodes == (\lambda g : Graph \bullet subjects(g) \cup objects(g)) \]

Figure 2: Alice contact graph

For example, the graph for Alice contains the following nodes.

\[ \vdash nodes(alice\_graph) = \{ alice, bob, charlie, Alice, Bob, Charlie, foaf\_Person \} \]
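
For readers who prefer executable notation, the graph functions above can be mirrored in Python. This is only a sketch under stated assumptions: terms are modelled as strings, literals are written with embedded quotation marks to keep them distinct from IRIs, and a graph is a frozenset of triples. None of these representation choices come from the formal definitions.

```python
RDF_TYPE, FOAF_PERSON = "rdf:type", "foaf:Person"
FOAF_NAME, FOAF_KNOWS = "foaf:name", "foaf:knows"

# The Alice graph; literals carry their quotation marks.
alice_graph = frozenset({
    ("alice", RDF_TYPE, FOAF_PERSON),
    ("alice", FOAF_NAME, '"Alice"'),
    ("alice", FOAF_KNOWS, "bob"),
    ("alice", FOAF_KNOWS, "charlie"),
    ("bob", RDF_TYPE, FOAF_PERSON),
    ("bob", FOAF_NAME, '"Bob"'),
    ("charlie", RDF_TYPE, FOAF_PERSON),
    ("charlie", FOAF_NAME, '"Charlie"'),
})


def subjects(g):
    return {s for (s, p, o) in g}


def predicates(g):
    return {p for (s, p, o) in g}


def objects(g):
    return {o for (s, p, o) in g}


def nodes(g):
    """The nodes of a graph are its subjects and objects."""
    return subjects(g) | objects(g)


print(sorted(predicates(alice_graph)))
print(sorted(nodes(alice_graph)))
```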

A pointed graph consists of a graph and a base node in the graph. 

\[
PointedGraph \;\widehat{=}\; [\, graph : Graph;\ baseNode : TERM \mid baseNode \in nodes(graph) \,]
\]


• The base node is some node in the graph. 

The base node of a pointed graph is also referred to as the start node or 
focus node of the graph, depending on the context. 

For example, alice is the natural base node of the graph for Alice. 

\[
\begin{array}{l}
alice\_pg : PointedGraph \\
\hline
alice\_pg.graph = alice\_graph \\
alice\_pg.baseNode = alice
\end{array}
\]

• The graph is alice_graph.

• The base node is alice.

4 Neighbour Functions 

RDF applications often impose conditions on nodes, and related conditions on 
their neighbours, where a neighbour is some node that bears a specified relation 
to the given node. When the neighbour relation between nodes is specified by
traversing triples, we say that the nodes are connected by a path. SPARQL
1.1 [7] defines a property path syntax for specifying paths.

More generally, applications may use neighbour relations that cannot be 
specified by property paths. Many such relations might be specified by SPARQL 
queries that bind pairs of variables to nodes. For maximum generality, we do 
not place restrictions on how neighbour relations are specified. 

A neighbour function is any mapping from graphs to sets of pairs of nodes that
belong to the graph.

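\[
\begin{array}{l}
Neighbour : \mathbb{P}(Graph \to (TERM \leftrightarrow TERM)) \\
\hline
Neighbour = \\
\quad \{\, q : Graph \to (TERM \leftrightarrow TERM) \mid \\
\qquad (\forall g : Graph \bullet q(g) \subseteq \{\, x, y : nodes(g) \,\}) \,\}
\end{array}
\]

• A neighbour function is a mapping that maps a graph g to a binary relation
on the nodes of g.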

We say that the pair of nodes (x, y) matches the neighbour function q in the
graph g when $(x, y) \in q(g)$.

4.1 Simple Path Expressions 

Simple path expressions define a very commonly used type of neighbour
function.

A predicate p defines a simple path expression forward(p) by traversing 
triples in the forward direction. Forward path expressions are referred to as 
PredicatePath expressions in SPARQL 1.1. 

\[
\begin{array}{l}
forward : IRI \to Neighbour \\
\hline
\forall p : IRI;\ g : Graph \bullet \\
\quad forward(p)(g) = \{\, s, o : nodes(g) \mid (s, p, o) \in g \,\}
\end{array}
\]

• The simple path expression forward(p) matches all pairs (s, o) such that
(s, p, o) is a triple in g.

For example, the following are forward path expressions. 

\[
\begin{array}{l}
has\_type == forward(rdf\_type) \\
has\_name == forward(foaf\_name) \\
knows == forward(foaf\_knows)
\end{array}
\]

The forward path expression has_type matches the following pairs of nodes
in the graph for Alice.

\[
\begin{array}{l}
\vdash has\_type(alice\_graph) = \\
\quad \{ (alice, foaf\_Person), (bob, foaf\_Person), (charlie, foaf\_Person) \}
\end{array}
\]

Similarly, a predicate p defines a simple path expression backward(p) by
traversing triples in the backward direction. Backward path expressions are
referred to as InversePath expressions in SPARQL 1.1.

\[
\begin{array}{l}
backward : IRI \to Neighbour \\
\hline
\forall p : IRI;\ g : Graph \bullet \\
\quad backward(p)(g) = \{\, o, s : nodes(g) \mid (s, p, o) \in g \,\}
\end{array}
\]

• The simple path expression backward(p) matches all pairs (o, s) such that
(s, p, o) is a triple in g.

For example, the following is a backward path expression. 

\[ is\_known\_by == backward(foaf\_knows) \]

The backward path expression is_known_by matches the following pairs of
nodes in the graph for Alice.

\[ \vdash is\_known\_by(alice\_graph) = \{ (bob, alice), (charlie, alice) \} \]

4.2 Values 

Given a graph g and a node $x \in nodes(g)$, the set of all nodes that can be
reached from x by matching the neighbour function q is values(g, x, q).

\[
\begin{array}{l}
values : Graph \times TERM \times Neighbour \to \mathbb{F}\, TERM \\
\hline
\forall g : Graph;\ x : TERM;\ q : Neighbour \bullet \\
\quad values(g, x, q) = \{\, y : nodes(g) \mid (x, y) \in q(g) \,\}
\end{array}
\]

• The node y is in values(g, x, q) when (x, y) matches q in g.

For example, in the graph for Alice the node alice and forward path
expression knows have the following values.

\[ \vdash values(alice\_graph, alice, knows) = \{ bob, charlie \} \]
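
The neighbour functions of this section can be sketched in Python as higher-order functions. The sketch assumes a graph is a frozenset of string triples and, for brevity, keeps only the rdf:type and foaf:knows triples of the Alice graph. The Python names mirror the Z names but the encoding itself is an assumption made for illustration.

```python
RDF_TYPE, FOAF_PERSON, FOAF_KNOWS = "rdf:type", "foaf:Person", "foaf:knows"

# Type and knows triples of the Alice graph (name triples omitted).
alice_graph = frozenset({
    ("alice", RDF_TYPE, FOAF_PERSON),
    ("bob", RDF_TYPE, FOAF_PERSON),
    ("charlie", RDF_TYPE, FOAF_PERSON),
    ("alice", FOAF_KNOWS, "bob"),
    ("alice", FOAF_KNOWS, "charlie"),
})


def forward(p):
    """Neighbour function matching (s, o) for each triple (s, p, o)."""
    return lambda g: {(s, o) for (s, q, o) in g if q == p}


def backward(p):
    """Neighbour function matching (o, s) for each triple (s, p, o)."""
    return lambda g: {(o, s) for (s, q, o) in g if q == p}


def values(g, x, q):
    """Nodes reachable from x by matching the neighbour function q in g."""
    return {y for (a, y) in q(g) if a == x}


has_type = forward(RDF_TYPE)
knows = forward(FOAF_KNOWS)
is_known_by = backward(FOAF_KNOWS)

print(sorted(is_known_by(alice_graph)))
print(sorted(values(alice_graph, "alice", knows)))
```

Because a neighbour function is just a function from graphs to sets of pairs, relations that are not expressible as property paths fit the same interface.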


5 Constraints 

RDF applications often impose constraints on the data graphs they process. 
A given graph either satisfies or violates the constraint. Thus a constraint 
partitions the set of all graphs into two disjoint subsets, namely the set of all 
graphs that satisfy the constraint and the set of all graphs that violate the 
constraint. A constraint is therefore defined by the set of graphs that satisfy it. 

A constraint is a possibly infinite set of graphs.

\[ Constraint == \mathbb{P}\, Graph \]

For example, suppose we define a small graph to be a graph that has at most 
10 triples. The set of all small graphs is a constraint. 

\[
\begin{array}{l}
small\_graphs : Constraint \\
\hline
small\_graphs = \{\, g : Graph \mid \# g \leq 10 \,\}
\end{array}
\]

The Alice graph satisfies this constraint.

\[ \vdash alice\_graph \in small\_graphs \]

5.1 Node Constraints 

A parameterized constraint is a mapping from some parameter set X to
constraints.

\[ ParameterizedConstraint[X] == X \to Constraint \]

A term constraint is a constraint that is parameterized by terms.

\[ TermConstraint == ParameterizedConstraint[TERM] \]

For example, given a term $x \in TERM$, the constraint hasSubject(x) is the
set of all graphs that have x as a subject.

\[
\begin{array}{l}
hasSubject : TermConstraint \\
\hline
\forall x : TERM \bullet \\
\quad hasSubject(x) = \{\, g : Graph \mid x \in subjects(g) \,\}
\end{array}
\]

Similarly, hasPredicate(x), hasObject(x), and hasNode(x) are constraints
with the analogous definitions.

\[
\begin{array}{l}
hasPredicate == (\lambda x : TERM \bullet \{\, g : Graph \mid x \in predicates(g) \,\}) \\
hasObject == (\lambda x : TERM \bullet \{\, g : Graph \mid x \in objects(g) \,\}) \\
hasNode == (\lambda x : TERM \bullet \{\, g : Graph \mid x \in nodes(g) \,\})
\end{array}
\]

Note that hasNode(x) is the union of hasSubject(x) and hasObject(x).

\[ \vdash \forall x : TERM \bullet hasNode(x) = hasSubject(x) \cup hasObject(x) \]

A node constraint is a term constraint in which the term is a node in each 
graph that satisfies the constraint. 

\[
\begin{array}{l}
NodeConstraint : \mathbb{P}\, TermConstraint \\
\hline
NodeConstraint = \\
\quad \{\, c : TermConstraint \mid \forall x : TERM \bullet \forall g : c(x) \bullet x \in nodes(g) \,\}
\end{array}
\]

For example, hasNode is a node constraint.

\[ \vdash hasNode \in NodeConstraint \]

The PIM application enforces the following node constraints. 

Both contact and associate nodes must be people. 

\[
\begin{array}{l}
is\_a\_person : NodeConstraint \\
\hline
\forall x : TERM \bullet \\
\quad is\_a\_person(x) = \{\, g : Graph \mid (x, rdf\_type, foaf\_Person) \in g \,\}
\end{array}
\]

• A node is a person when it has foaf:Person as one of its RDF types.

For example, the Alice graph satisfies this constraint at the alice, bob, and
charlie nodes.

\[
\begin{array}{l}
\vdash alice\_graph \in is\_a\_person(alice) \land {} \\
\quad alice\_graph \in is\_a\_person(bob) \land {} \\
\quad alice\_graph \in is\_a\_person(charlie)
\end{array}
\]

Both contact and associate nodes must have exactly one name. 

\[
\begin{array}{l}
has\_one\_name : NodeConstraint \\
\hline
\forall x : TERM \bullet \\
\quad has\_one\_name(x) = \{\, g : Graph \mid \exists_1 y : TERM \bullet (x, foaf\_name, y) \in g \,\}
\end{array}
\]

• A node has one name when it is the subject of exactly one foaf:name
triple.

For example, the Alice graph satisfies this constraint at the alice, bob, and
charlie nodes.

\[
\begin{array}{l}
\vdash alice\_graph \in has\_one\_name(alice) \land {} \\
\quad alice\_graph \in has\_one\_name(bob) \land {} \\
\quad alice\_graph \in has\_one\_name(charlie)
\end{array}
\]

Associate nodes must be known by exactly one node. 

\[
\begin{array}{l}
is\_known\_by\_one : NodeConstraint \\
\hline
\forall x : TERM \bullet \\
\quad is\_known\_by\_one(x) = \{\, g : Graph \mid \exists_1 y : TERM \bullet (y, foaf\_knows, x) \in g \,\}
\end{array}
\]

• A node is known by one node when it is the object of exactly one foaf:knows
triple.

For example, the Alice graph satisfies this constraint at the bob and charlie
nodes.

\[
\begin{array}{l}
\vdash alice\_graph \in is\_known\_by\_one(bob) \land {} \\
\quad alice\_graph \in is\_known\_by\_one(charlie)
\end{array}
\]

A contact node must satisfy the following constraint. 

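\[
\begin{array}{l}
contact\_nc : NodeConstraint \\
\hline
\forall x : TERM \bullet \\
\quad contact\_nc(x) = is\_a\_person(x) \cap has\_one\_name(x)
\end{array}
\]

• A contact is a person and has one name.

The Alice graph satisfies this constraint at the alice, bob, and charlie nodes.

\[
\begin{array}{l}
\vdash alice\_graph \in contact\_nc(alice) \land {} \\
\quad alice\_graph \in contact\_nc(bob) \land {} \\
\quad alice\_graph \in contact\_nc(charlie)
\end{array}
\]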

An associate node must satisfy the following constraint. 

\[
\begin{array}{l}
associate\_nc : NodeConstraint \\
\hline
\forall x : TERM \bullet \\
\quad associate\_nc(x) = is\_a\_person(x) \cap has\_one\_name(x) \cap is\_known\_by\_one(x)
\end{array}
\]

• An associate is a person, has one name, and is known by one node.

The Alice graph satisfies this constraint at the bob and charlie nodes.

\[
\begin{array}{l}
\vdash alice\_graph \in associate\_nc(bob) \land {} \\
\quad alice\_graph \in associate\_nc(charlie)
\end{array}
\]
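
The PIM node constraints can also be sketched in Python. Since a constraint is a possibly infinite set of graphs, the sketch models each node constraint c by its membership test, a hypothetical predicate `c(g, x)` that is true exactly when g belongs to c(x). Intersection of constraints then becomes logical conjunction. The string encoding of terms is an assumption made for illustration.

```python
RDF_TYPE, FOAF_PERSON = "rdf:type", "foaf:Person"
FOAF_NAME, FOAF_KNOWS = "foaf:name", "foaf:knows"

alice_graph = frozenset({
    ("alice", RDF_TYPE, FOAF_PERSON),
    ("alice", FOAF_NAME, '"Alice"'),
    ("alice", FOAF_KNOWS, "bob"),
    ("alice", FOAF_KNOWS, "charlie"),
    ("bob", RDF_TYPE, FOAF_PERSON),
    ("bob", FOAF_NAME, '"Bob"'),
    ("charlie", RDF_TYPE, FOAF_PERSON),
    ("charlie", FOAF_NAME, '"Charlie"'),
})


def is_a_person(g, x):
    return (x, RDF_TYPE, FOAF_PERSON) in g


def has_one_name(g, x):
    # Exactly one y such that (x, foaf:name, y) is in g.
    return len({o for (s, p, o) in g if s == x and p == FOAF_NAME}) == 1


def is_known_by_one(g, x):
    # Exactly one y such that (y, foaf:knows, x) is in g.
    return len({s for (s, p, o) in g if o == x and p == FOAF_KNOWS}) == 1


def contact_nc(g, x):
    return is_a_person(g, x) and has_one_name(g, x)


def associate_nc(g, x):
    return is_a_person(g, x) and has_one_name(g, x) and is_known_by_one(g, x)


for node in ("alice", "bob", "charlie"):
    print(node, contact_nc(alice_graph, node), associate_nc(alice_graph, node))
```

In this encoding alice satisfies contact_nc but not associate_nc, because no node in the Alice graph knows alice.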


6 Shapes 

In general, a shape is any description of the expected contents of a graph. In 
this article we deal only with shapes that describe graphs using the following 
structure. A shape is a structure that defines how to associate a set of node 
constraints with each node of a data graph in two steps. 

1. Label each node of the graph with a set of node constraint names using a
set of neighbour functions. 

2. Map each name to a node constraint. 

These steps are described in detail below. 

Note that this definition of shape is very prescriptive about the labelling 
process but is completely independent of the details of both the neighbour
functions and the node constraints. We speculate that the labelling process can be
used to handle the recursive aspects of a wide variety of shape languages that
differ only in their expressiveness for defining neighbour functions and node
constraints. For example, Resource Shape 2.0 uses forward and backward path
expressions as neighbour functions and has a small, fixed set of simple node
constraints. SHACL-SPARQL allows node constraints to be expressed by arbitrary
SPARQL 1.1 queries, but does not allow explicit recursion. 

6.1 Labelling Data Graph Nodes with Constraint Names 

A shape contains a set of named constraints. A constraint may refer to other 
constraints by name. This means that a constraint may refer directly or
indirectly to itself, in which case the constraint is recursive.

Shapes themselves may be represented as RDF graphs, so it is tempting to 
use IRIs to name node constraints. However, we introduce a new given set of 
names to emphasize that this set is logically independent of how we represent 
shapes. 

[NAME] 

For example, there are two distinct kinds of node in the PIM application, 
namely contact and associate. 

\[
\begin{array}{l}
contact, associate : NAME \\
\hline
contact \neq associate
\end{array}
\]

• contact and associate are distinct names. 

Since graphs appear in several roles, there is scope for confusion. To clarify 
its role, the graph to which constraints are being applied will be referred to as 
the data graph. 

The part of a shape that defines how data graph nodes are labelled is a 
neighbour graph. A neighbour graph is a directed, labelled, multigraph whose 
nodes are names and whose arcs are labelled by neighbour functions. 

\[
\begin{array}{l}
NeighbourGraph \;\widehat{=}\; [\, names : \mathbb{F}\, NAME;\ arcs : \mathbb{F}(NAME \times Neighbour \times NAME) \mid \\
\qquad arcs \subseteq names \times Neighbour \times names \,]
\end{array}
\]

• The nodes are names and the arcs are labelled by neighbour functions.


For example, in the PIM application, contacts are related to associates by
the knows forward path expression, and associates are related to contacts by
the is_known_by backward path expression.

\[
\begin{array}{l}
pim\_ng : NeighbourGraph \\
\hline
pim\_ng.names = \{ contact, associate \} \\
pim\_ng.arcs = \\
\quad \{ (contact, knows, associate), \\
\quad\ \ (associate, is\_known\_by, contact) \}
\end{array}
\]

A pointed neighbour graph consists of a neighbour graph and a base name in 
the graph. 

\begin{schema}{PointedNeighbourGraph}
  NeighbourGraph \\
  baseName : NAME
\where
  baseName \in names
\end{schema}

• The base name belongs to the graph. 

The base name of a pointed neighbour graph is also referred to as the start 
name or focus name, depending on the context. 

For example, contact is the natural base name in the PIM application. 

\begin{axdef}
  pim\_png : PointedNeighbourGraph
\where
  pim\_png.names = pim\_ng.names \\
  pim\_png.arcs = pim\_ng.arcs \\
  pim\_png.baseName = contact
\end{axdef}

A named node is a pair of the form (x, a) where x is a data graph node and a 
is a node constraint name. 

\begin{zed}
  NamedNode == TERM \cross NAME
\end{zed}

For example, (alice, contact) is a named node. 

\begin{zed}
  \vdash (alice, contact) \in NamedNode
\end{zed}

A data graph g and a neighbour graph ng define a binary relation 
requires(g, ng) on the set of named nodes. The meaning of this relation is that 
if (x, a) requires (y, b), then whenever x must satisfy the constraints named by 
a, y must satisfy the constraints named by b. 

\begin{axdef}
  requires : Graph \cross NeighbourGraph \fun (NamedNode \rel NamedNode)
\where
  \forall g : Graph; ng : NeighbourGraph \bullet \\
  \t1 requires(g, ng) = \\
  \t2 \{ x, y : nodes(g); a, b : NAME; q : Neighbour \mid \\
  \t3 (a, q, b) \in ng.arcs \land \\
  \t3 (x, y) \in q(g) \bullet \\
  \t3 (x, a) \mapsto (y, b) \}
\end{axdef}

• The named node {x, a) requires (y, b) when the neighbour graph includes 
an arc (a, q, b) and the node y can be reached from x by matching the 
neighbour function q in g. 

For example, the requires relation for the Alice graph in the PIM application 
is as follows. 

\begin{zed}
  \vdash requires(alice\_graph, pim\_ng) = \\
  \t1 \{ (alice, contact) \mapsto (bob, associate), \\
  \t2 (alice, contact) \mapsto (charlie, associate), \\
  \t2 (bob, associate) \mapsto (alice, contact), \\
  \t2 (charlie, associate) \mapsto (alice, contact) \}
\end{zed}
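
The requires relation can be computed directly from its definition. The Python sketch below is a minimal illustration, not a reference implementation. It assumes that a data graph is a set of (subject, predicate, object) triples and that a neighbour function is encoded as a (direction, predicate) pair; the helper `pairs` and the two-triple stand-in for the Alice graph are hypothetical and keep only the knows triples.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]      # (subject, predicate, object)
Neighbour = Tuple[str, str]        # assumed encoding: (direction, predicate)
Arc = Tuple[str, Neighbour, str]   # (name, neighbour function, name)
NamedNode = Tuple[str, str]        # (data graph node, name)


def pairs(g: Set[Triple], q: Neighbour) -> Set[Tuple[str, str]]:
    """Evaluate q on g, giving the node pairs (x, y) that q relates."""
    direction, predicate = q
    if direction == "forward":
        return {(s, o) for (s, p, o) in g if p == predicate}
    if direction == "backward":
        return {(o, s) for (s, p, o) in g if p == predicate}
    raise ValueError(f"unknown direction: {direction}")


def requires(g: Set[Triple], arcs: Set[Arc]) -> Set[Tuple[NamedNode, NamedNode]]:
    """All pairs ((x, a), (y, b)) with (a, q, b) an arc and (x, y) in q(g)."""
    return {((x, a), (y, b)) for (a, q, b) in arcs for (x, y) in pairs(g, q)}


# Stand-in for the Alice graph; only the knows triples matter here.
alice_graph = {
    ("alice", "knows", "bob"),
    ("alice", "knows", "charlie"),
}
pim_arcs = {
    ("contact", ("forward", "knows"), "associate"),
    ("associate", ("backward", "knows"), "contact"),
}

for left, right in sorted(requires(alice_graph, pim_arcs)):
    print(left, "requires", right)
```

Under these assumptions the computed relation contains the same four pairs as the example above.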

A labelled graph is a data graph whose nodes are each labelled by a, possibly 
empty, set of names. 

\begin{schema}{LabelledGraph}
  graph : Graph \\
  names : \finset NAME \\
  label : TERM \pfun \finset NAME
\where
  label \in nodes(graph) \fun \finset names
\end{schema}

• Each node in the graph is labelled by a set of names. 

For example, the following is a labelled graph based on the Alice graph. 

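\begin{axdef}
  alice\_lg : LabelledGraph
\where
  alice\_lg.graph = alice\_graph \\
  alice\_lg.names = \{ contact, associate \} \\
  alice\_lg.label = \\
  \t1 \{ alice \mapsto \{ contact \}, \\
  \t2 bob \mapsto \{ associate \}, \\
  \t2 charlie \mapsto \{ associate \}, \\
  \t2 Alice \mapsto \emptyset, \\
  \t2 Bob \mapsto \emptyset, \\
  \t2 Charlie \mapsto \emptyset, \\
  \t2 foaf\_Person \mapsto \emptyset \}
\end{axdef}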
A pointed graph and a pointed neighbour graph determine a unique labelled 
graph. Intuitively, the labelling process starts by labelling the base node 
with the base name. Next, the neighbour graph is checked for arcs that begin 
at baseName, e.g. (baseName, q, b). For each such arc, compute the set 
values(g, q, baseNode) and, for each node y in this set, label y with b. Now 
repeat these steps taking y as the new base node and b as the new base name, but 
only do this once for each named node (y, b). Since there are a finite number of 
nodes and a finite number of names, this process always terminates. 

\begin{schema}{LabelGraph}
  PointedGraph \\
  PointedNeighbourGraph \\
  LabelledGraph
\where
  \LET ng == \theta NeighbourGraph \bullet \\
  \t1 \LET R == (requires(graph, ng)) \star \bullet \\
  \t2 label = (\lambda y : nodes(graph) \bullet \\
  \t3 \{ b : names \mid (baseNode, baseName) \inrel{R} (y, b) \})
\end{schema}

• The label of a node y is the set of names b such that the base named node 
(baseNode, baseName) is related to the named node (y, b) by R, the 
reflexive-transitive closure of the requires relation requires(graph, ng). 

• Note that this labelling process makes use of R, the reflexive-transitive 
closure of the requires relation. The use of R avoids difficulties associated 
with explicitly recursive definitions. We have, in effect, eliminated explicit 
recursion by computing a transitive closure of a finite binary relation. 

• Note that the components of LabelledGraph are uniquely determined by 
the components of PointedGraph and PointedNeighbour Graph. 

For example, the pointed graph alice_pg and the pointed neighbour graph 
pim_png uniquely determine the labelled graph alice_lg. 

\begin{zed}
  \vdash \forall LabelGraph \mid \\
  \t1 \theta PointedGraph = alice\_pg \land \\
  \t1 \theta PointedNeighbourGraph = pim\_png \bullet \\
  \t1 \theta LabelledGraph = alice\_lg
\end{zed}
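
The labelling process amounts to a reachability search over named nodes. The following Python sketch illustrates this under stated assumptions: the data graph is a set of triples, neighbour functions are encoded as (direction, predicate) pairs, and the triples listed for the Alice graph are a hypothetical stand-in chosen to match the nodes used in the running example. The `seen` set plays the role of the rule that each named node is processed only once, which is what guarantees termination.

```python
from collections import deque


def neighbours(g, q, x):
    """Nodes y reachable from x by neighbour function q = (direction, predicate)."""
    direction, predicate = q
    if direction == "forward":
        return {o for (s, p, o) in g if p == predicate and s == x}
    return {s for (s, p, o) in g if p == predicate and o == x}


def label_graph(g, arcs, base_node, base_name):
    """Label each node with the names reachable from (base_node, base_name)."""
    nodes = {s for (s, _, _) in g} | {o for (_, _, o) in g}
    if base_node not in nodes:
        raise ValueError("base node must belong to the data graph")
    label = {n: set() for n in nodes}
    seen = {(base_node, base_name)}      # named nodes already scheduled
    queue = deque(seen)
    while queue:
        x, a = queue.popleft()
        label[x].add(a)
        for (start, q, b) in arcs:
            if start != a:
                continue
            for y in neighbours(g, q, x):
                if (y, b) not in seen:   # visit each named node once
                    seen.add((y, b))
                    queue.append((y, b))
    return label


# Hypothetical stand-in for the Alice graph.
alice_graph = {
    ("alice", "rdf:type", "foaf:Person"),
    ("alice", "foaf:name", "Alice"),
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "charlie"),
    ("bob", "foaf:name", "Bob"),
    ("charlie", "foaf:name", "Charlie"),
}
pim_arcs = {
    ("contact", ("forward", "foaf:knows"), "associate"),
    ("associate", ("backward", "foaf:knows"), "contact"),
}

alice_lg = label_graph(alice_graph, pim_arcs, "alice", "contact")
```

The search visits at most one entry per (node, name) pair, so its running time is bounded by the number of nodes times the number of names, multiplied by the cost of evaluating the neighbour functions.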

6.2 Mapping Constraint Names to Node Constraints 

The association of node constraints to graph nodes is given by a mapping. 

\begin{schema}{NodeConstraints}
  names : \finset NAME \\
  constraint : NAME \pfun NodeConstraint
\where
  \dom constraint = names
\end{schema}

• Each name maps to a node constraint. 


For example, the PIM application associates the following node constraints 
with names. 

\begin{axdef}
  pim\_ncs : NodeConstraints
\where
  pim\_ncs.names = \{ contact, associate \} \\
  pim\_ncs.constraint = \\
  \t1 \{ contact \mapsto contact\_nc, \\
  \t2 associate \mapsto associate\_nc \}
\end{axdef}

• The PIM application has two kinds of nodes, named contact and associate. 

• contact nodes must satisfy the contact_nc node constraint. 

• associate nodes must satisfy the associate_nc node constraint. 

A constrained graph is an assignment of a, possibly empty, set of node 
constraints to each node of the graph. 

\begin{schema}{ConstrainedGraph}
  graph : Graph \\
  constraints : TERM \pfun \finset NodeConstraint
\where
  \dom constraints = nodes(graph)
\end{schema}

• Each node of the data graph has a set of node constraints. 

For example, the PIM application enforces the following constraints on the 
Alice graph. 

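\begin{axdef}
  alice\_cg : ConstrainedGraph
\where
  alice\_cg.graph = alice\_graph \\
  alice\_cg.constraints = \\
  \t1 \{ alice \mapsto \{ contact\_nc \}, \\
  \t2 bob \mapsto \{ associate\_nc \}, \\
  \t2 charlie \mapsto \{ associate\_nc \}, \\
  \t2 Alice \mapsto \emptyset, \\
  \t2 Bob \mapsto \emptyset, \\
  \t2 Charlie \mapsto \emptyset, \\
  \t2 foaf\_Person \mapsto \emptyset \}
\end{axdef}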

• alice must satisfy the contact node constraint. 

• bob and charlie must satisfy the associate node constraint. 

• There are no node constraints on the remaining nodes. 
A constrained graph is valid if it satisfies all the constraints at each node. 

\begin{schema}{ValidGraph}
  ConstrainedGraph
\where
  \forall x : nodes(graph) \bullet \\
  \t1 \forall c : constraints(x) \bullet \\
  \t2 graph \in c(x)
\end{schema}

• A valid data graph satisfies each node constraint at each node. 

For example, the constrained graph alice_cg is valid. 

\begin{zed}
  \vdash alice\_cg \in ValidGraph
\end{zed}

A mapping from nodes to names (LabelledGraph) and a mapping from names 
to node constraints (NodeConstraints) uniquely determine a mapping from 
nodes to node constraints (ConstrainedGraph). 

\begin{schema}{ConstrainGraph}
  LabelledGraph \\
  NodeConstraints \\
  ConstrainedGraph
\where
  \forall x : nodes(graph) \bullet \\
  \t1 constraints(x) = \\
  \t2 \{ a : label(x) \bullet constraint(a) \}
\end{schema}

• The set of node constraints at each node x of a labelled data graph is 
equal to the set of node constraints named by the labels a at x. 

• Note that the components of ConstrainedGraph are uniquely determined 
by the components of LabelledGraph and NodeConstraints. 

For example, the Alice labelled graph alice_lg and the PIM node constraints 
pim_ncs uniquely determine the Alice constrained graph alice_cg. 

\begin{zed}
  \vdash \forall ConstrainGraph \mid \\
  \t1 \theta LabelledGraph = alice\_lg \land \\
  \t1 \theta NodeConstraints = pim\_ncs \bullet \\
  \t1 \theta ConstrainedGraph = alice\_cg
\end{zed}
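
The constraining and validity steps are simple once a labelling is available. The Python sketch below is illustrative only. It models a node constraint as a predicate on a (graph, node) pair, which is one way to represent the set of graphs c(x). The predicates `contact_nc` and `associate_nc` are simplified, hypothetical stand-ins for the PIM node constraints, and the triples are an assumed version of the Alice graph.

```python
def count_out(g, x, predicate):
    """Number of triples in g with subject x and the given predicate."""
    return sum(1 for (s, p, _) in g if s == x and p == predicate)


def contact_nc(g, x):
    # Simplified stand-in: a contact has exactly one name.
    return count_out(g, x, "foaf:name") == 1


def associate_nc(g, x):
    # Simplified stand-in: exactly one name, and known by exactly one node.
    known_by = sum(1 for (_, p, o) in g if p == "foaf:knows" and o == x)
    return count_out(g, x, "foaf:name") == 1 and known_by == 1


def constrain_graph(label, constraint):
    """ConstrainGraph: replace each name in a label by its node constraint."""
    return {x: {constraint[a] for a in names} for x, names in label.items()}


def is_valid(g, constraints):
    """ValidGraph: every node satisfies every constraint assigned to it."""
    return all(c(g, x) for x, cs in constraints.items() for c in cs)


alice_graph = {
    ("alice", "rdf:type", "foaf:Person"),
    ("alice", "foaf:name", "Alice"),
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "charlie"),
    ("bob", "foaf:name", "Bob"),
    ("charlie", "foaf:name", "Charlie"),
}
alice_label = {
    "alice": {"contact"}, "bob": {"associate"}, "charlie": {"associate"},
    "Alice": set(), "Bob": set(), "Charlie": set(), "foaf:Person": set(),
}
pim_constraint = {"contact": contact_nc, "associate": associate_nc}

alice_cg = constrain_graph(alice_label, pim_constraint)
print(is_valid(alice_graph, alice_cg))
```

Nodes with an empty label receive an empty set of constraints and are therefore trivially valid, which mirrors the treatment of the literal and class nodes in the example.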

6.3 Shapes as Constraints 

A shape consists of a neighbour graph and node constraints. 

\begin{schema}{Shape}
  NeighbourGraph \\
  NodeConstraints
\end{schema}

For example, the neighbour graph pim_ng and the node constraints pim_ncs 
define a shape for the PIM application. 

\begin{axdef}
  pim\_shape : Shape
\where
  pim\_shape.names = \{ contact, associate \} \\
  pim\_shape.arcs = pim\_ng.arcs \\
  pim\_shape.constraint = pim\_ncs.constraint
\end{axdef}

A pointed shape consists of a pointed neighbour graph and node constraints. 

\begin{schema}{PointedShape}
  PointedNeighbourGraph \\
  NodeConstraints
\end{schema}

For example, the pointed neighbour graph pim_png, which has base name 
contact, and the node constraints pim_ncs define a pointed shape for the PIM 
application. 

\begin{axdef}
  pim\_ps : PointedShape
\where
  pim\_ps.names = \{ contact, associate \} \\
  pim\_ps.baseName = contact \\
  pim\_ps.arcs = pim\_ng.arcs \\
  pim\_ps.constraint = pim\_ncs.constraint
\end{axdef}

A pointed data graph satisfies a pointed shape if the constrained graph 
produced by the composition of the labelling and constraining processes is valid. 

\begin{schema}{SatisfiesShape}
  PointedShape \\
  PointedGraph \\
  LabelGraph \\
  ConstrainGraph \\
  ValidGraph
\end{schema}

• The constrained graph that results from the labelling and constraining 
processes must be valid. 

• Note that the components of LabelGraph and ConstrainGraph are uniquely 
determined by the components of PointedShape and PointedGraph. The 
validity condition (ValidGraph) therefore determines a relation between 
PointedShape and PointedGraph. The pointed graph is said to satisfy the 
pointed shape. 
For example, the pointed Alice graph satisfies the pointed PIM shape. 

\begin{zed}
  \vdash \exists_1 SatisfiesShape \bullet \\
  \t1 \theta PointedShape = pim\_ps \land \\
  \t1 \theta PointedGraph = alice\_pg
\end{zed}

A pointed shape determines a node constraint. 

\begin{axdef}
  shapeConstraint : PointedShape \fun NodeConstraint
\where
  \forall ps : PointedShape; x : TERM \bullet \\
  \t1 shapeConstraint(ps)(x) = \\
  \t2 \{ SatisfiesShape \mid ps = \theta PointedShape \land x = baseNode \bullet graph \}
\end{axdef}

• Given a pointed shape ps, shapeConstraint(ps) is a node constraint. Given 
a node x, shapeConstraint(ps)(x) is the set of all graphs graph such that 
the pointed graph formed by using x as the base node satisfies ps. 

For example, the pointed PIM shape defines a node constraint. 

\begin{axdef}
  pim\_nc : NodeConstraint
\where
  pim\_nc = shapeConstraint(pim\_ps)
\end{axdef}

The graph alice_graph satisfies this node constraint at the node alice. 

\begin{zed}
  \vdash alice\_graph \in pim\_nc(alice)
\end{zed}
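
Composing the labelling, constraining, and validity steps gives a direct way to evaluate the node constraint determined by a pointed shape. The Python sketch below is a minimal illustration under assumptions: graphs are sets of triples, neighbour functions are (direction, predicate) pairs, node constraints are predicates on (graph, node), and `has_one_name` is a hypothetical stand-in for both PIM node constraints.

```python
def shape_constraint(arcs, constraint, base_name):
    """Return the node constraint of a pointed shape as a predicate nc(g, x)."""
    def nc(g, x):
        nodes = {s for (s, _, _) in g} | {o for (_, _, o) in g}
        if x not in nodes:
            return False                 # no pointed graph has x as base node
        # Labelling: collect the named nodes reachable from (x, base_name).
        seen = {(x, base_name)}
        todo = [(x, base_name)]
        while todo:
            node, a = todo.pop()
            for (start, (direction, predicate), b) in arcs:
                if start != a:
                    continue
                for (s, p, o) in g:
                    if p != predicate:
                        continue
                    src, dst = (s, o) if direction == "forward" else (o, s)
                    if src == node and (dst, b) not in seen:
                        seen.add((dst, b))
                        todo.append((dst, b))
        # Constraining and validity: each named node must satisfy its constraint.
        return all(constraint[a](g, node) for (node, a) in seen)
    return nc


def has_one_name(g, x):
    # Hypothetical stand-in constraint: exactly one foaf:name value.
    return sum(1 for (s, p, _) in g if s == x and p == "foaf:name") == 1


pim_arcs = {
    ("contact", ("forward", "foaf:knows"), "associate"),
    ("associate", ("backward", "foaf:knows"), "contact"),
}
pim_nc = shape_constraint(
    pim_arcs, {"contact": has_one_name, "associate": has_one_name}, "contact"
)

alice_graph = {
    ("alice", "foaf:name", "Alice"),
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "charlie"),
    ("bob", "foaf:name", "Bob"),
    ("charlie", "foaf:name", "Charlie"),
}
print(pim_nc(alice_graph, "alice"))
```

Because the result is itself a predicate on (graph, node), it can be used wherever a simple node constraint is expected, which matches the statement that a pointed shape determines a node constraint.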


7 Relation to Shape Languages 

This section discusses how the preceding formalism relates to Resource Shape 
2.0, ShEx, and SHACL. 

7.1 Relation to Resource Shape 2.0 

Resource Shape 2.0 provides a small vocabulary for defining simple, commonly 
occurring node constraints such as property occurrence, range, and allowed 
values. These are uncontentious and will not be discussed further. 

As mentioned above, Resource Shape 2.0 also allows recursive shapes via 
the property oslc:valueShape. The preceding formalism was motivated by 
Resource Shape 2.0 and, not surprisingly, provides a precise description of the 
meaning of recursive shapes in that language. 

7.1.1 Example: Personal Information Management 

The following listings illustrate the use of oslc:valueShape for the running 
PIM example. 

Listing 5 contains the resource shape for contacts. Note that Resource Shape 
2.0 is incapable of expressing the constraint that a contact must not have itself 
as an associate. 
    # http://example.org/shapes/contact
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix oslc: <http://open-services.net/ns/core#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @base <http://example.org/shapes/> .

    <contact> a oslc:ResourceShape ;
        oslc:describes foaf:Person ;
        oslc:property
            <contact#name> ,
            <contact#knows> .

    <contact#name> a oslc:Property ;
        oslc:name "name" ;
        oslc:occurs oslc:Exactly-one ;
        oslc:propertyDefinition foaf:name ;
        oslc:valueType xsd:string .

    <contact#knows> a oslc:Property ;
        oslc:name "knows" ;
        oslc:occurs oslc:Zero-or-many ;
        oslc:propertyDefinition foaf:knows ;
        oslc:range foaf:Person ;
        oslc:valueShape <associate> .

Listing 5: Resource shape for contacts 

Listing 6 contains the resource shape for associates. 

    # http://example.org/shapes/associate
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix oslc: <http://open-services.net/ns/core#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @base <http://example.org/shapes/> .

    <associate> a oslc:ResourceShape ;
        oslc:describes foaf:Person ;
        oslc:property
            <associate#name> ,
            <associate#isKnownBy> .

    <associate#name> a oslc:Property ;
        oslc:name "name" ;
        oslc:occurs oslc:Exactly-one ;
        oslc:propertyDefinition foaf:name ;
        oslc:valueType xsd:string .

    <associate#isKnownBy> a oslc:Property ;
        oslc:name "isKnownBy" ;
        oslc:occurs oslc:Exactly-one ;
        oslc:propertyDefinition foaf:knows ;
        oslc:isInverseProperty true ;
        oslc:range foaf:Person ;
        oslc:valueShape <contact> .

Listing 6: Resource shape for associates 

This example illustrates recursive shapes since the contact shape refers to 
the associate shape and the associate shape refers to the contact shape. This 
apparent circularity would cause difficulty if contact and associate were each 
described as a constraint. However, using the preceding formalism, the 
composite shape consisting of both the contact and associate resource shapes is 
given a well-defined meaning. 

In Resource Shape 2.0, neighbour functions are limited to forward and 
backward path expressions. In fact, backward path expressions were missing from 
the original OSLC Resource Shape language and are a proposed extension in 
Resource Shape 2.0. The proposed syntax for backward path expressions uses 
the optional property oslc:isInverseProperty. This design could be improved 
to provide better compatibility with downlevel clients, since a downlevel 
client might silently ignore the new property and produce incorrect results. 

The following SPARQL query extracts the neighbour graph arcs from a set 
of resource shapes. Each binding of (?a ?direction ?p ?b) corresponds to an 
arc (a, q, b) where q = forward(p) or q = backward(p). 


    prefix oslc: <http://open-services.net/ns/core#>

    select distinct ?a ?direction ?p ?b
    where {
        ?a a oslc:ResourceShape ;
            oslc:property ?prop .
        ?prop a oslc:Property ;
            oslc:propertyDefinition ?p ;
            oslc:valueShape ?b .
        optional {?prop oslc:isInverseProperty ?inverse}
        bind (if(bound(?inverse) && ?inverse,
            'backward', 'forward') as ?direction)
    }

Listing 7: Query for neighbour graph arcs 

The result of running this query on the PIM resource shapes is given in 
Table 4. 

    a              direction     p             b
    <contact>      "forward"     foaf:knows    <associate>
    <associate>    "backward"    foaf:knows    <contact>

Table 4: Result of query on PIM shape 

The query correctly extracts the neighbour graph pim_ng of the PIM shape. 
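
The rows returned by the query can be turned into neighbour graph arcs mechanically. The Python sketch below shows one way to do this. The function name `arcs_from_rows` and the encoding of a neighbour function as a (direction, predicate) pair are assumptions made for illustration; the rows are those of the query result for the PIM shapes.

```python
def arcs_from_rows(rows):
    """Convert (a, direction, p, b) query rows to arcs (a, (direction, p), b)."""
    arcs = set()
    for (a, direction, p, b) in rows:
        if direction not in ("forward", "backward"):
            raise ValueError(f"unexpected direction: {direction}")
        arcs.add((a, (direction, p), b))
    return arcs


# Rows of the query result for the PIM resource shapes.
rows = [
    ("<contact>", "forward", "foaf:knows", "<associate>"),
    ("<associate>", "backward", "foaf:knows", "<contact>"),
]

pim_arcs = arcs_from_rows(rows)
# The names of the neighbour graph are the shapes that occur in some arc.
pim_names = {name for (a, _q, b) in pim_arcs for name in (a, b)}
```

Note that a shape with no oslc:valueShape properties, and not referenced by any other shape, yields no rows, so in general the set of names should be taken from the set of resource shapes rather than from the arcs alone.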
7.1.2 Example: Polentoni 

Listing 8 contains the resource shape for Polentoni. 

    # http://example.org/shapes/polentoni
    @prefix ex: <http://example.org/polentoni#> .
    @prefix oslc: <http://open-services.net/ns/core#> .
    @base <http://example.org/shapes/> .

    <polentoni> a oslc:ResourceShape ;
        oslc:property
            <polentoni#livesIn> ,
            <polentoni#knows> .

    <polentoni#livesIn> a oslc:Property ;
        oslc:name "livesIn" ;
        oslc:occurs oslc:Exactly-one ;
        oslc:propertyDefinition ex:livesIn ;
        oslc:allowedValue ex:NorthernItaly .

    <polentoni#knows> a oslc:Property ;
        oslc:name "knows" ;
        oslc:occurs oslc:Zero-or-many ;
        oslc:propertyDefinition ex:knows ;
        oslc:valueShape <polentoni> .

Listing 8: Resource shape for Polentoni 

This resource shape is clearly recursive since it refers to itself. However, as 
shown above, this recursion has a well-defined meaning and causes no difficulties. 
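
To see why a self-referential shape causes no difficulty, note that with a single name the labelling step reduces to reachability along ex:knows, which terminates even when the data contains cycles. The Python sketch below illustrates this under assumptions: the graph is a set of triples, and the node names anna, bruno, and carla are hypothetical data invented for the example.

```python
def reachable(g, start, predicate):
    """Nodes reachable from start by following predicate zero or more times."""
    seen = {start}
    frontier = [start]
    while frontier:
        x = frontier.pop()
        for (s, p, o) in g:
            if s == x and p == predicate and o not in seen:
                seen.add(o)              # each node is visited at most once
                frontier.append(o)
    return seen


def lives_in_northern_italy(g, x):
    # Exactly one ex:livesIn value, and it is ex:NorthernItaly.
    values = [o for (s, p, o) in g if s == x and p == "ex:livesIn"]
    return values == ["ex:NorthernItaly"]


def is_polentoni(g, x):
    """x and everyone x transitively knows must live in Northern Italy."""
    return all(lives_in_northern_italy(g, y) for y in reachable(g, x, "ex:knows"))


# Hypothetical data with a cycle: anna knows bruno and bruno knows anna.
g = {
    ("anna", "ex:knows", "bruno"),
    ("bruno", "ex:knows", "anna"),
    ("bruno", "ex:knows", "carla"),
    ("anna", "ex:livesIn", "ex:NorthernItaly"),
    ("bruno", "ex:livesIn", "ex:NorthernItaly"),
    ("carla", "ex:livesIn", "ex:NorthernItaly"),
}
print(is_polentoni(g, "anna"))
```

The `seen` set ensures that the cycle between anna and bruno is traversed only once, which corresponds to taking the reflexive-transitive closure of a finite relation.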

7.2 Relation to ShEx 

One major difference between Resource Shape 2.0 and ShEx is that ShEx 
allows the definition of much richer node constraints using regular expressions, 
disjunction, and other operations. ShEx also allows recursion via reference to 
named shapes, e.g. @<UserShape>, which is referred to as the ValueReference 
feature. 

Most features of Resource Shape 2.0 can be expressed in ShEx. The 
intersection of Resource Shape 2.0 and ShEx certainly includes the recursive 
aspects of Resource Shape 2.0 as expressed by oslc:valueShape. Therefore the 
preceding formalism applies to ShEx provided that ValueReference is used in a 
way that maps directly to Resource Shape 2.0. 

However, ShEx allows a more permissive use of ValueReference. For example, 
a ValueReference may appear inside a GroupRule with cardinalities. It is not 
clear that this usage can be expressed using suitably defined neighbour functions. 
7.3 Relation to SHACL 

The W3C RDF Data Shapes Working Group is currently developing the SHACL 
specification. At the time of writing, there are three competing proposals. One 
proposal, influenced by SPIN, appears to treat recursion similarly to Resource 
Shape 2.0. A second proposal is a further development of ShEx. The third 
proposal, SHACL-SPARQL, takes a different approach by avoiding recursion 
entirely. 

The ability to name and refer to shapes allows for more intuitive descriptions 
of the constraints on graphs. The absence of recursion in SHACL-SPARQL 
therefore detracts from its usefulness. However, since the formalism presented 
here gives a clear and unobjectionable meaning to a limited form of recursion, 
this capability could be added to SHACL-SPARQL. 

8 Conclusion 

The formalism presented here gives a precise meaning to recursive shapes as 
defined in Resource Shape 2.0. 

This formalism is applicable to a subset of ShEx in which recursion is suitably 
limited. More analysis is required in order to determine if the unlimited form 
of recursion allowed in ShEx and its SHACL follow-on can be described using 
suitable neighbour functions, or if some new concept is required. 

Finally, the limited form of recursion presented here could be added to the 
SHACL-SPARQL proposal to enhance its expressiveness. 